Open architecture for multilingual parallel texts

Author

  • Manuel Carrasco-Benitez
Abstract

Multilingual parallel texts (abbreviated to parallel texts) are linguistic versions of the same content ("translations"); e.g., the Maastricht Treaty in English and Spanish are parallel texts. This document is about creating an open architecture for the whole Authoring, Translation and Publishing Chain (ATP-chain) for the processing of parallel texts.

Two activities are proposed:

  • Public discussion: use a mailing list (or similar) to reach a rough consensus, in particular on aspects such as the required specifications, and organise the necessary meeting(s).
  • Tools: implement some tools. This might be done simultaneously with the public discussion to better illustrate the approach and support the discussion.

To obtain the best quality and speed at the lowest possible cost (QSC) in the production of parallel texts, one should aim for:

  • Generating all the linguistic versions ready for publication from linguistic resources; this is probably one of the best approaches.
  • Seamless ATP-chain implementations.
  • Authoring: Computer-aided authoring (CAA) tools with a controlled authoring environment, which should deliver source texts prepared for translation.
  • Translation: Computer-aided translation (CAT) tools that allow translators to focus only on translating and unburden them from auxiliary tasks such as formatting. These tools should offer functionalities such as a side-by-side editor and the re-use of previous translations.
  • Publishing: Computer-aided publishing (CAP) tools to minimise human intervention.

The open architecture (i.e., based on open standards) must allow the proper interoperability of programs from different software producers. It must be possible to implement client tools that give users (e.g., authors, translators) the impression of interfacing with one seamless system. These client applications might interact (through open standards) with many application servers that might come from different software producers; hence the complexity can be hidden from the users.
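As a rough illustration of the last point, the sketch below shows one way a client tool could hide several application servers behind a single call. It is not taken from the paper: the `TranslationService` interface, the two server classes and the `Client` class are all hypothetical names, and the shared interface merely stands in for whatever open standard the public discussion would settle on.

```python
from typing import Optional, Protocol


class TranslationService(Protocol):
    """Hypothetical shared interface, standing in for an open standard."""

    def translate(self, text: str, source: str, target: str) -> Optional[str]:
        ...


class MemoryServer:
    """Hypothetical server from producer A: re-uses previous translations."""

    def __init__(self, memory: dict[tuple[str, str, str], str]) -> None:
        self._memory = memory

    def translate(self, text: str, source: str, target: str) -> Optional[str]:
        # Returns None when no previous translation is stored.
        return self._memory.get((text, source, target))


class PlaceholderServer:
    """Hypothetical server from producer B: flags a segment for a translator."""

    def translate(self, text: str, source: str, target: str) -> Optional[str]:
        return f"[{target}: to translate] {text}"


class Client:
    """Client tool: the user sees one system, whatever servers sit behind it."""

    def __init__(self, services: list[TranslationService]) -> None:
        self._services = services

    def translate(self, text: str, source: str, target: str) -> str:
        # Ask each server in turn and return the first answer.
        for service in self._services:
            result = service.translate(text, source, target)
            if result is not None:
                return result
        raise LookupError("no service could handle the segment")


memory = {("Treaty on European Union", "en", "es"): "Tratado de la Unión Europea"}
client = Client([MemoryServer(memory), PlaceholderServer()])
print(client.translate("Treaty on European Union", "en", "es"))
print(client.translate("Article 1", "en", "es"))
```

Because both servers satisfy the same interface, either could be replaced by a product from another software producer without changing the client, which is the interoperability property the abstract asks for.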

Similar resources

Building a multilingual parallel corpus for human users

We present the architecture and the current state of InterCorp, a multilingual parallel corpus centered around Czech, intended primarily for human users and consisting of written texts with a focus on fiction. Following an outline of its recent development and a comparison with some other multilingual parallel corpora we give an overview of the data collection procedure that covers text selecti...

Development of Circulating Support Environment of Multilingual Medical Communication using Parallel Texts for Foreign Patients

The need for multilingual communication in Japan has increased due to an increase in the number of foreigners in the country. When people communicate in their nonnative language, the differences in language prevent mutual understanding among the communicating individuals. In the medical field, communication between the hospital staff and patients is a serious problem. Currently, medical transla...

The Multilingual Affective Soccer Corpus (MASC): Compiling a biased parallel corpus on soccer reportage in English, German and Dutch

The emergence of the internet has led to a whole range of possibilities to not only collect large, but also highly specified text corpora for linguistic research. This paper introduces the Multilingual Affective Soccer Corpus. MASC is a collection of soccer match reports in English, German and Dutch. Parallel texts are collected manually from the involved soccer clubs’ homepages with the aim of...

A method for multilingual text mining and retrieval using growing hierarchical self-organizing maps

With the increasing amount of multilingual texts in the Internet, multilingual text retrieval techniques have become an important research issue. However, the discovery of relationships between different languages remains an open problem. In this paper we propose a method, which applied the growing hierarchical self-organizing map (GHSOM) model, to discover knowledge from multilingual text docu...

Multilingual Lexical Database Generation from parallel texts with endogenous resources

This paper deals with multilingual database generation from parallel corpora. The idea is to contribute to the enrichment of lexical databases for languages with few linguistic resources. Our approach is endogenous: it relies on the raw texts only, it does not require external linguistic resources such as stemmers or taggers. The system produces alignments for the 20 European languages of the ‘...

Journal title:
  • CoRR

Volume abs/0808.3889

Publication date 2008